62 research outputs found

    The ASAP II database: analysis and comparative genomics of alternative splicing in 15 animal species

    We have greatly expanded the Alternative Splicing Annotation Project (ASAP) database: (i) its human alternative splicing data are expanded ∼3-fold over the previous ASAP database, to nearly 90 000 distinct alternative splicing events; (ii) it now provides genome-wide alternative splicing analyses for 15 vertebrate, insect and other animal species; (iii) it provides comprehensive comparative genomics information for comparing alternative splicing and splice site conservation across 17 aligned genomes, based on UCSC multigenome alignments; (iv) it provides an ∼2- to 3-fold expansion in detection of tissue-specific alternative splicing events, and of cancer versus normal specific alternative splicing events. We have also constructed a novel database linking orthologous exons and orthologous introns between genomes, based on multigenome alignment of 17 animal species. It can be a valuable resource for studies of gene structure evolution. ASAP II provides a new web interface enabling more detailed exploration of the data, and integrating comparative genomics information with alternative splicing data. We provide a set of tools for advanced data-mining of ASAP II with Pygr (the Python Graph Database Framework for Bioinformatics) including powerful features such as graph query, multigenome alignment query, etc. ASAP II is available at

    Causal graph-based analysis of genome-wide association data in rheumatoid arthritis

    BACKGROUND: GWAS owe their popularity to the expectation that they will make a major impact on diagnosis, prognosis and management of disease by uncovering genetics underlying clinical phenotypes. The dominant paradigm in GWAS data analysis so far consists of extensive reliance on methods that emphasize contribution of individual SNPs to statistical association with phenotypes. Multivariate methods, however, can extract more information by considering associations of multiple SNPs simultaneously. Recent advances in other genomics domains pinpoint multivariate causal graph-based inference as a promising principled analysis framework for high-throughput data. Designed to discover biomarkers in the local causal pathway of the phenotype, these methods lead to accurate and highly parsimonious multivariate predictive models. In this paper, we investigate the applicability of the causal graph-based method TIE* to analysis of GWAS data. To test the utility of TIE*, we focus on anti-CCP positive rheumatoid arthritis (RA) GWAS datasets, where there is a general consensus in the community about the major genetic determinants of the disease. RESULTS: Application of TIE* to the North American Rheumatoid Arthritis Cohort (NARAC) GWAS data results in six SNPs, mostly from the MHC locus. Using these SNPs we develop two predictive models that can classify cases and disease-free controls with an accuracy of 0.81 area under the ROC curve, as verified in independent testing data from the same cohort. The predictive performance of these models generalizes reasonably well to Swedish subjects from the closely related but not identical Epidemiological Investigation of Rheumatoid Arthritis (EIRA) cohort with 0.71-0.78 area under the ROC curve. Moreover, the SNPs identified by the TIE* method render many other previously known SNP associations conditionally independent of the phenotype. CONCLUSIONS: Our experiments demonstrate that application of TIE* captures the maximum amount of genetic information about RA in the data and recapitulates the major consensus findings about the genetic factors of this disease. In addition, TIE* yields reproducible markers and signatures of RA. This suggests that a principled multivariate causal and predictive framework for GWAS analysis empowers the community with a new tool for high-quality and more efficient discovery. REVIEWERS: This article was reviewed by Prof. Anthony Almudevar, Dr. Eugene V. Koonin, and Prof. Marianthi Markatou.
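
The abstract above reports model quality as area under the ROC curve (AUC). As a reading aid, the following is a minimal sketch of how AUC can be computed from predicted scores: it equals the probability that a randomly chosen case is scored higher than a randomly chosen control, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    labels: iterable of 1 (case) or 0 (control); scores: predicted scores.
    Hypothetical helper for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # case ranked above control
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 3 of the 4 case/control pairs are ordered correctly.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.81 and 0.71-0.78 values should be read.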

    OHMI: The Ontology of Host-Microbiome Interactions

    Host-microbiome interactions (HMIs) are critical for the modulation of biological processes and are associated with several diseases, and extensive HMI studies have generated large amounts of data. We propose that the logical representation of the knowledge derived from these data and the standardized representation of experimental variables and processes can foster integration of data and reproducibility of experiments and thereby further HMI knowledge discovery. A community-based Ontology of Host-Microbiome Interactions (OHMI) was developed following the OBO Foundry principles. OHMI leverages established ontologies to create logically structured representations of microbiomes, microbial taxonomy, host species, host anatomical entities, and HMIs under different conditions and associated study protocols and types of data analysis and experimental results

    Strategic Applications of Gene Expression: From Drug Discovery/Development to Bedside

    ABSTRACT. Gene expression is useful for identifying the molecular signature of a disease and for correlating a pharmacodynamic marker with the dose-dependent cellular responses to drug exposure. Gene expression can guide drug discovery by illustrating engagement of the desired cellular pathways/networks, as well as avoidance of toxicological pathways. Successful employment of gene-expression signatures in the later stages of drug development depends on their linkage to clinically meaningful phenotypic characteristics and requires a biologically meaningful mechanism combined with stringent statistical rigor. Much of the success in clinical drug development hinges on predefining the signature genes according to their fitness for the intended application. Specific examples are highlighted to illustrate the breadth and depth of the potential utility of gene-expression signatures in drug discovery and clinical development to targeted therapeutics at the bedside

    Simrank: Rapid and sensitive general-purpose k-mer search tool

    Terabyte-scale collections of string-encoded data are expected from consortia efforts such as the Human Microbiome Project (http://nihroadmap.nih.gov/hmp). Intra- and inter-project data similarity searches are enabled by rapid k-mer matching strategies. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as sub-routines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available. Here we present a stand-alone utility, Simrank, which allows users to rapidly identify the database strings most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-language datasets found Simrank to be 10X to 928X faster, depending on the dataset. Simrank provides molecular ecologists with a high-throughput, open source choice for comparing large sequence sets to find similarity
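
To make the idea of k-mer matching concrete, here is a minimal sketch that ranks database strings by the fraction of a query's k-mers they share. This is not Simrank's actual algorithm or interface, which the abstract does not describe; the function names, the scoring rule, and the toy sequences are all assumptions chosen for illustration.

```python
def kmers(s, k):
    """Return the set of all overlapping length-k substrings of s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def rank_by_shared_kmers(query, database, k=3):
    """Rank database entries by the fraction of the query's k-mers they contain.

    database: dict mapping an entry name to its string.
    Returns (name, score) pairs, best match first; ties are broken by name.
    Hypothetical helper, not part of Simrank.
    """
    query_kmers = kmers(query, k)
    if not query_kmers:
        return []  # query shorter than k: nothing to compare
    scored = []
    for name, seq in database.items():
        shared = len(query_kmers & kmers(seq, k))
        scored.append((name, shared / len(query_kmers)))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


# Toy database of made-up sequences.
toy_db = {"a": "ACGTACGT", "b": "TTTTTTTT", "c": "ACGTTTTT"}
toy_ranking = rank_by_shared_kmers("ACGTAC", toy_db, k=3)
```

Because k-mers are plain substrings, the same sketch applies unchanged to protein sequences or natural-language text. A production tool would typically precompute an index of database k-mers rather than rescanning every entry per query.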

    A comprehensive evaluation of multicategory classification methods for microbiomic data

    BACKGROUND: Recent advances in next-generation DNA sequencing enable rapid high-throughput quantitation of microbial community composition in human samples, opening up a new field of microbiomics. One of the promises of this field is linking abundances of microbial taxa to phenotypic and physiological states, which can inform development of new diagnostic, personalized medicine, and forensic modalities. Prior research has demonstrated the feasibility of applying machine learning methods to perform body site and subject classification with microbiomic data. However, it is currently unknown which classifiers perform best among the many available alternatives for classification with microbiomic data. RESULTS: In this work, we performed a systematic comparison of 18 major classification methods, 5 feature selection methods, and 2 accuracy metrics using 8 datasets spanning 1,802 human samples and various classification tasks: body site and subject classification and diagnosis. CONCLUSIONS: We found that random forests, support vector machines, kernel ridge regression, and Bayesian logistic regression with Laplace priors are the most effective machine learning techniques for performing accurate classification from these microbiomic data

    Enrichment of lung microbiome with supraglottic taxa is associated with increased pulmonary inflammation

    BACKGROUND: The lung microbiome of healthy individuals frequently harbors oral organisms. Despite evidence that microaspiration is commonly associated with smoking-related lung diseases, the effects of lung microbiome enrichment with upper airway taxa on inflammation have not been studied. We hypothesize that the presence of oral microorganisms in the lung microbiome is associated with enhanced pulmonary inflammation. To test this, we sampled bronchoalveolar lavage (BAL) from the lower airways of 29 asymptomatic subjects (nine never-smokers, 14 former-smokers, and six current-smokers). We quantified, amplified, and sequenced 16S rRNA genes from BAL samples by qPCR and 454 sequencing. Pulmonary inflammation was assessed by exhaled nitric oxide (eNO), BAL lymphocytes, and neutrophils. RESULTS: BAL had lower total 16S than supraglottic samples and higher than saline background. Bacterial communities in the lower airway clustered in two distinct groups that we designated as pneumotypes. The rRNA gene concentration and microbial community of the first pneumotype were similar to those of the saline background. The second pneumotype had higher rRNA gene concentration and higher relative abundance of supraglottic-characteristic taxa (SCT), such as Veillonella and Prevotella, and we called it pneumotype(SCT). Smoking had no effect on pneumotype allocation, α, or β diversity. Pneumotype(SCT) was associated with higher BAL lymphocyte count (P = 0.007), BAL neutrophil count (P = 0.034), and eNO (P = 0.022). CONCLUSION: A pneumotype with high relative abundance of supraglottic-characteristic taxa is associated with enhanced subclinical lung inflammation